Analysis of large-scale distributed machine learning systems: a case study on LDA
TANG Lizhe, FENG Dawei, LI Dongsheng, LI Rongchun, LIU Feng
Journal of Computer Applications    2017, 37 (3): 628-634.   DOI: 10.11772/j.issn.1001-9081.2017.03.628
Abstract
To address the problems of scalability, algorithm convergence and operational efficiency in building large-scale machine learning systems, the challenges that large-scale samples, large models and network communication pose to such systems were analyzed, and the solutions adopted by existing systems were presented. Taking the Latent Dirichlet Allocation (LDA) model as an example, three open-source distributed LDA systems, Spark LDA, PLDA+ and LightLDA, were compared, and their differences in system design, implementation and performance were analyzed in terms of system resource consumption, algorithm convergence and scalability. The experimental results show that for small sample sets and models, the memory usage of LightLDA and PLDA+ is about half that of Spark LDA, and their convergence speed is 4 to 5 times that of Spark LDA. For large-scale sample sets and models, the network communication volume and convergence time of LightLDA are much smaller than those of PLDA+ and Spark LDA, showing good scalability. The "data parallelism + model parallelism" approach can effectively meet the challenges of large-scale samples and models, while the Stale Synchronous Parallel (SSP) consistency model for parameters, local model caching and sparse parameter storage can effectively reduce network cost and improve system operation efficiency.
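The last point can be illustrated with a minimal sketch of the SSP idea: each worker caches model parameters locally and only refetches a value from the parameter server when the cached copy is older than a fixed staleness bound, which is what reduces network traffic compared with fully synchronous updates. The sketch below is a single-process simulation under assumed names (SSPParameterCache, InMemoryServer, pull, push); it is not taken from Spark LDA, PLDA+ or LightLDA.

# Minimal sketch of Stale Synchronous Parallel (SSP) parameter access.
# All class and method names here are hypothetical illustrations.
from collections import defaultdict


class InMemoryServer:
    """Stand-in for a distributed parameter server."""

    def __init__(self):
        self.table = defaultdict(float)

    def pull(self, key):
        return self.table[key]

    def push(self, key, delta):
        self.table[key] += delta


class SSPParameterCache:
    """Worker-side parameter cache with bounded staleness."""

    def __init__(self, server, staleness_bound):
        self.server = server              # remote parameter store (stub)
        self.staleness = staleness_bound  # maximum allowed clock gap
        self.cache = {}                   # locally cached parameter values
        self.cached_at_clock = {}         # clock at which each value was fetched
        self.clock = 0                    # this worker's iteration counter

    def get(self, key):
        # Serve from the local cache while it is fresh enough;
        # otherwise refetch from the server.
        fetched = self.cached_at_clock.get(key, -1)
        if key not in self.cache or self.clock - fetched > self.staleness:
            self.cache[key] = self.server.pull(key)
            self.cached_at_clock[key] = self.clock
        return self.cache[key]

    def update(self, key, delta):
        # Apply the update locally and push the delta to the server.
        self.cache[key] = self.cache.get(key, 0.0) + delta
        self.server.push(key, delta)

    def advance_clock(self):
        # Called once per local iteration (mini-batch / sampling sweep).
        self.clock += 1

With a staleness bound of 0 this degenerates to bulk-synchronous behavior, while a larger bound lets workers proceed on slightly stale parameters and amortizes communication over several iterations, the trade-off the compared systems exploit to different degrees.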